
DO NOT MERGE: test dual-condition monitor with cache-only memory pressure#970

Closed
devin-ai-integration[bot] wants to merge 4 commits into devin/1774478445-memory-failfast from devin/1774888310-memory-failfast-feature-2

devin-ai-integration bot commented Mar 30, 2026

This PR targets the following PR:


Summary

Adds a synthetic cache memory pressure scenario in AirbyteEntrypoint.read() to test the negative case of the dual-condition memory monitor (introduced in #962). For each consumed record, a 10 MB file-backed mmap region is created on /tmp (overlay fs), which inflates cgroup memory usage without increasing the Python process's anonymous RSS (RssAnon).

This validates that the memory monitor correctly does not raise AirbyteTracedException when container memory is high but the pressure comes from file-backed/reclaimable pages rather than process-private anonymous memory.

DO NOT MERGE — test-only change to validate memory monitor behavior.

Companion PR: #969 (tests the positive case — process RSS growth triggers fail-fast)

Updates since last revision

  • Switched mmap backing store from /dev/shm (tmpfs) to /tmp (overlay fs) to fix SIGBUS (exit code 135). Docker containers typically cap /dev/shm at 64 MB, so after ~6 records the tmpfs filled up and the next mm.write() triggered a bus error before the memory monitor ever got a chance to evaluate. /tmp on overlay fs has no such hard limit, and file-backed mmap pages there still count toward cgroup memory as RssFile (not RssAnon), preserving the dual-condition test.
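The SIGBUS described above is capacity arithmetic: a 64 MB tmpfs can back only six full 10 MB files, so writes for the seventh record fault. A small sketch of that calculation, plus an optional pre-flight free-space check that turns the condition into an ordinary OSError; both helper names are hypothetical and not part of the branch:

```python
import shutil

MiB = 1024 * 1024


def mappings_before_exhaustion(fs_capacity_bytes, per_mapping_bytes):
    """How many full mappings a size-capped filesystem such as tmpfs can back."""
    return fs_capacity_bytes // per_mapping_bytes


def ensure_room(directory, size_bytes):
    """Raise OSError up front instead of letting a later page fault end in SIGBUS."""
    free = shutil.disk_usage(directory).free
    if free < size_bytes:
        raise OSError(f"{directory} has {free} bytes free, {size_bytes} needed")


# A 64 MiB /dev/shm: six 10 MiB files fit, the seventh does not.
print(mappings_before_exhaustion(64 * MiB, 10 * MiB))  # 6
```

The check is only a heuristic: ftruncate on tmpfs creates a sparse file without reserving space, and another writer can consume the free space between the check and the page fault, so a mapping can still SIGBUS.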

Review & Testing Checklist for Human

  • Verify that mmap pages backed by /tmp (overlay fs) actually inflate cgroup memory.current in the target deployment environment — overlay fs page cache accounting can vary by kernel version and storage driver
  • Deploy a connector using this branch in a memory-bounded container and confirm cgroup usage climbs toward the limit
  • Confirm that RssAnon in /proc/self/status remains low relative to the container limit
  • Verify the memory monitor logs the "pressure likely from file-backed or reclaimable pages — not raising" info message instead of raising an exception
  • Confirm the sync completes (or is killed by the OOM killer) without a graceful AirbyteTracedException shutdown
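For the checklist items above, both numbers being compared can be read directly from the kernel. A minimal sketch, assuming cgroup v2 mounted at /sys/fs/cgroup (cgroup v1 exposes memory.usage_in_bytes and memory.limit_in_bytes instead); the helper names are hypothetical:

```python
from pathlib import Path


def parse_status_kib(status_text, field):
    """Return the value of a '<field>:  <n> kB' line from /proc/<pid>/status, in KiB."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    return None  # field not present (RssAnon exists only since Linux 4.5)


def read_cgroup_v2_memory(root="/sys/fs/cgroup"):
    """Return (memory.current, memory.max) in bytes; max is None when unlimited."""
    base = Path(root)
    current = int((base / "memory.current").read_text())
    raw_max = (base / "memory.max").read_text().strip()
    return current, None if raw_max == "max" else int(raw_max)


def rss_anon_share_of_limit(status_text, limit_bytes):
    """Fraction of the container limit occupied by this process's anonymous memory."""
    rss_anon_kib = parse_status_kib(status_text, "RssAnon")
    return rss_anon_kib * 1024 / limit_bytes
```

On Linux, pass the contents of /proc/self/status together with the memory.max value. During this test, memory.current should climb toward the limit while the RssAnon share stays small.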

Notes

  • The mmap pages and file descriptors are intentionally never closed — the goal is to accumulate memory pressure throughout the sync
  • Uses tempfile.mktemp (which has a theoretical race condition) rather than tempfile.mkstemp; acceptable for a throwaway test branch
  • Each record creates a 10 MB mmap'd file on /tmp; memory growth rate depends on record throughput
  • If overlay fs pages are too aggressively reclaimed by the kernel and cgroup usage never reaches the 98% critical threshold, the test may not exercise the dual-condition code path — check the memory monitor warning logs to confirm cgroup usage is actually climbing
  • This pattern is the counterpart to #969 (DO NOT MERGE: test graceful OOM shutdown via intentional memory leak), which tests that process-private memory growth does trigger fail-fast
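Because each record adds a fixed 10 MB, the container limit and baseline usage give a rough lower bound on how many records the test needs before the 98% critical threshold can be reached. A sketch under the best-case assumption that the kernel reclaims none of the mapped pages; the helper name and the example container size are hypothetical:

```python
MiB = 1024 * 1024


def records_until_threshold(limit_bytes, baseline_bytes, per_record_bytes, threshold_pct=98):
    """Minimum records before cgroup usage can reach threshold_pct of the limit.

    Best case only: assumes each record adds exactly per_record_bytes and the
    kernel reclaims none of it. Reclaim of file-backed pages can only raise this
    number or keep usage below the threshold entirely.
    """
    target = -(-limit_bytes * threshold_pct // 100)  # ceil(limit * pct / 100)
    remaining = max(0, target - baseline_bytes)
    return -(-remaining // per_record_bytes)  # ceiling division


# Hypothetical 2 GiB container with 200 MiB already in use, 10 MiB per record.
print(records_until_threshold(2048 * MiB, 200 * MiB, 10 * MiB))  # 181
```

If the monitor's warning logs show cgroup usage plateauing well before this many records have been read, reclaim is winning and the dual-condition path is probably not being exercised.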

Link to Devin session: https://app.devin.ai/sessions/070ecb51ceee4f9189e1c09a83ba31cb

Co-Authored-By: patrick.nilan@airbyte.io <patrick.nilan@airbyte.io>
devin-ai-integration bot (PR author) commented:

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


github-actions bot commented:
👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.


Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1774888310-memory-failfast-feature-2#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1774888310-memory-failfast-feature-2

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

…-condition monitor

Co-Authored-By: patrick.nilan@airbyte.io <patrick.nilan@airbyte.io>
devin-ai-integration bot changed the title from "feat(cdk): memory fail-fast feature branch 2" to "DO NOT MERGE: test dual-condition monitor with cache-only memory pressure" on Mar 30, 2026
github-actions bot commented Mar 30, 2026

PyTest Results (Fast)

  • 3 948 tests (-41): 3 937 ✅ passed (-41), 11 💤 skipped (±0), 0 ❌ failed (±0)
  • 1 suite (±0), 1 file (±0)
  • Duration: 7m 45s ⏱️ (-21s)

Results for commit 31be325. ± Comparison against base commit 3c7c755.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 30, 2026

PyTest Results (Full)

  • 3 951 tests (-41): 3 939 ✅ passed (-41), 12 💤 skipped (±0), 0 ❌ failed (±0)
  • 1 suite (±0), 1 file (±0)
  • Duration: 11m 27s ⏱️ (-7s)

Results for commit 31be325. ± Comparison against base commit 3c7c755.

♻️ This comment has been updated with latest results.


Patrick Nilan (pnilan) commented Mar 30, 2026

/prerelease

Prerelease Job Info

This job triggers the publish workflow with default arguments to create a prerelease.

Prerelease job started... Check job output.

✅ Prerelease workflow triggered successfully.

View the publish workflow run: https://github.com/airbytehq/airbyte-python-cdk/actions/runs/23757385016

…GBUS

/dev/shm is a tmpfs with a hard size limit (typically 64 MB in Docker
containers). After ~6 records at 10 MB each, the filesystem fills up
and the next mm.write() triggers SIGBUS (exit code 135).

Switching to /tmp (overlay fs) avoids the size limit. File-backed mmap
pages on overlay fs still count toward cgroup memory and go into
RssFile (not RssAnon), preserving the dual-condition test.

Co-Authored-By: patrick.nilan@airbyte.io <patrick.nilan@airbyte.io>
A comment by pnilan was marked as outdated.


Patrick Nilan (pnilan) commented Mar 31, 2026

/prerelease

Prerelease Job Info

This job triggers the publish workflow with default arguments to create a prerelease.

Prerelease job started... Check job output.

✅ Prerelease workflow triggered successfully.

View the publish workflow run: https://github.com/airbytehq/airbyte-python-cdk/actions/runs/23821153961

devin-ai-integration bot (PR author) commented:

DO NOT MERGE — test dual-condition monitor with cache-only memory pressure

This branch creates file-backed mmap pages on /tmp (overlay fs) for each consumed record, inflating cgroup memory usage without growing process anonymous RSS. This tests the negative case: the memory monitor should not raise an exception when memory pressure comes from reclaimable file-backed pages rather than process heap allocations.

Expected behavior: The memory monitor should detect high cgroup usage but low anonymous memory share, log "pressure likely from file-backed or reclaimable pages", and allow the connector to continue running normally.
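The expected behavior above amounts to a two-condition check. A minimal sketch of that decision, not the CDK's actual implementation: the 98% critical fraction comes from this PR's notes, while the anonymous-share threshold, the `MemorySample` type, and the `should_fail_fast` name are hypothetical:

```python
from dataclasses import dataclass

CRITICAL_USAGE_FRACTION = 0.98  # the 98% critical threshold named in this PR
ANON_SHARE_THRESHOLD = 0.5      # hypothetical; the real monitor's value is not stated here


@dataclass
class MemorySample:
    cgroup_current: int  # bytes, e.g. from memory.current
    cgroup_limit: int    # bytes, e.g. from memory.max
    rss_anon: int        # bytes, from RssAnon in /proc/self/status


def should_fail_fast(sample, critical=CRITICAL_USAGE_FRACTION, anon_share=ANON_SHARE_THRESHOLD):
    """Fail fast only if the container is near its limit AND the process owns that memory."""
    usage = sample.cgroup_current / sample.cgroup_limit
    if usage < critical:
        return False  # condition 1 not met: no pressure worth acting on
    # Condition 2: high usage made of file-backed/reclaimable pages is only logged.
    return sample.rss_anon / sample.cgroup_limit >= anon_share
```

Under this sketch, the cache-only scenario in this PR yields False (log and keep running), while the anonymous-memory leak tested in #969 yields True.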

CDK dev version: 7.13.0.post16.dev23821147586
Source-faker prerelease: publishing from PR #75610 (pinned to actor 2126fe35-fd02-4b6c-a207-1976da5c7156 in @devin-ai-sandbox)

